Monte Carlo Tree Search in Imperfect-Information Games (Doctoral Thesis)
Abstract
Monte Carlo Tree Search (MCTS) is currently the most popular game-playing algorithm for perfect-information extensive-form games. Its adaptation has led, for example, to Go-playing programs at human expert level and to substantial improvements in solvers for domain-independent automated planning. Inspired by this success, researchers have started to adapt the technique to imperfect-information games as well. Imperfect-information games pose several difficulties not present in perfect-information games, such as the need for randomized strategies to ensure optimal play. Even though the properties of optimal strategies in these games are well studied, the MCTS literature on imperfect-information games does not build on this knowledge sufficiently. None of the pre-existing MCTS algorithms is known to converge to the optimal strategy in the game, even given infinite computation time. In this thesis, we study MCTS in two-player zero-sum extensive-form games with imperfect information and focus on the game-theoretic properties of the produced strategies. We proceed in two steps. We first analyze in detail one of the simplest classes of games with imperfect information: games with simultaneous moves. Afterwards, we proceed to fully generic imperfect-information games. We survey the existing MCTS algorithms for these classes of games and classify them into a few fundamentally distinct classes. Furthermore, we provide the following contributions. First, we propose new MCTS algorithms that provably converge to a Nash equilibrium of the game with increasing computation time. We introduce three such algorithms: one based on a minor modification of the standard MCTS template for simultaneous-move games, and two others that adapt the successful Monte Carlo Counterfactual Regret Minimization (MCCFR) to online search in both simultaneous-move and imperfect-information games.
Second, we focus on improving the performance of MCTS algorithms, mainly by proposing and evaluating novel selection functions, which choose the actions to sample in later iterations based on the statistics collected in earlier iterations. In generic imperfect-information games, we propose explicit modelling of a player's beliefs about the probability of being in a specific game state during a match. Third, we perform an extensive evaluation of the proposed and existing MCTS methods on five simultaneous-move games and four fully imperfect-information games of variable size and with fundamentally different properties. We evaluate both the ability of the algorithms to quickly approximate a Nash equilibrium strategy and their performance in head-to-head tournaments. We show that the algorithms based on MCCFR converge very quickly to an equilibrium, but that classical MCTS with the novel selection functions performs better in tournaments in large games. Finally, we present a case study of using MCTS to create intelligent agents for a robotic visibility-based pursuit-evasion game. We design domain-specific variants of the previously introduced algorithms and evaluate their performance in a complex simulated environment. We show that the algorithms based on MCTS outperform the best previously known algorithm for this problem.
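The abstract does not spell out the selection functions or the MCCFR adaptations, so the following is only a rough illustration of the underlying idea, not the thesis's algorithms. It sketches regret matching, the update rule at the core of counterfactual regret minimization, on a single simultaneous-move (matrix) game. As simplifying assumptions, it uses exact expected payoffs instead of Monte Carlo samples, and the function names and the example payoff matrix are hypothetical.

```python
def regret_matching_strategy(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    n = len(regrets)
    if total <= 0.0:
        # No action has positive regret yet: fall back to uniform play.
        return [1.0 / n] * n
    return [p / total for p in positive]


def solve_matrix_game(payoff, iterations=20000):
    """Self-play regret matching on a zero-sum matrix game.

    payoff[i][j] is the row player's payoff; the column player receives
    its negation. Returns the average strategies of both players, which
    approach a Nash equilibrium as the number of iterations grows.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_regret = [0.0] * n_rows
    col_regret = [0.0] * n_cols
    row_sum = [0.0] * n_rows
    col_sum = [0.0] * n_cols

    for _ in range(iterations):
        row = regret_matching_strategy(row_regret)
        col = regret_matching_strategy(col_regret)

        # Expected value of each pure action against the opponent's
        # current mixed strategy (full feedback, no sampling).
        row_values = [sum(payoff[i][j] * col[j] for j in range(n_cols))
                      for i in range(n_rows)]
        col_values = [-sum(payoff[i][j] * row[i] for i in range(n_rows))
                      for j in range(n_cols)]
        row_expected = sum(row[i] * row_values[i] for i in range(n_rows))
        col_expected = sum(col[j] * col_values[j] for j in range(n_cols))

        # Regret: how much better each action would have done than the
        # strategy actually played in this iteration.
        for i in range(n_rows):
            row_regret[i] += row_values[i] - row_expected
            row_sum[i] += row[i]
        for j in range(n_cols):
            col_regret[j] += col_values[j] - col_expected
            col_sum[j] += col[j]

    return ([s / iterations for s in row_sum],
            [s / iterations for s in col_sum])


if __name__ == "__main__":
    # Hypothetical asymmetric matching-pennies payoffs for the row player.
    row_avg, col_avg = solve_matrix_game([[2, -1], [-1, 1]])
    print(row_avg, col_avg)
```

For this example matrix, the unique equilibrium mixes the first action with probability 2/5 for both players, so the averaged strategies should drift toward (0.4, 0.6). The current strategies themselves keep oscillating; only the averages converge, which is why a randomized strategy is read off the accumulated sums.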
Similar Resources
Search in Imperfect Information Games Using Online Monte Carlo Counterfactual Regret Minimization
Online search in games has always been a core interest of artificial intelligence. Advances made in search for perfect information games (such as Chess, Checkers, Go, and Backgammon) have led to AI capable of defeating the world’s top human experts. Search in imperfect information games (such as Poker, Bridge, and Skat) is significantly more challenging due to the complexities introduced by hid...
Online Monte Carlo Counterfactual Regret Minimization for Search in Imperfect Information Games
Online search in games has been a core interest of artificial intelligence. Search in imperfect information games (e.g., Poker, Bridge, Skat) is particularly challenging due to the complexities introduced by hidden information. In this paper, we present Online Outcome Sampling, an online search variant of Monte Carlo Counterfactual Regret Minimization, which preserves its convergence to Nash eq...
A theoretical and empirical investigation of search in imperfect information games
We examine search algorithms for games with imperfect information. We first investigate Monte Carlo sampling, showing that for very simple game trees the chance of finding an optimal strategy rapidly approaches zero as the size of the tree increases. We identify the reasons for this sub-optimality, and show that the same problems occur in Bridge, a popular real-world imperfect information game. We then...
Monte Carlo Tree Search for games with hidden information and uncertainty
Monte Carlo Tree Search (MCTS) is an AI technique that has been successfully applied to many deterministic games of perfect information, leading to large advances in a number of domains, such as Go and General Game Playing. Imperfect information games are less well studied in the field of AI despite being popular and of significant commercial interest, for example in the case of computer and mo...
Computation and Decision-Making in Large Extensive Form Games
In this thesis, we investigate the problem of decision-making in large two-player zero-sum games using Monte Carlo sampling and regret minimization methods. We demonstrate four major contributions. The first is Monte Carlo Counterfactual Regret Minimization (MCCFR): a generic family of sample-based algorithms that compute near-optimal equilibrium strategies. Secondly, we develop a...